    Identification of a new family of putative PD-(D/E)XK nucleases with unusual phylogenomic distribution and a new type of the active site

    BACKGROUND: Prediction of structure and function for uncharacterized protein families by identification of evolutionary links to characterized families and known structures is one of the cornerstones of genomics. Theoretical assignment of three-dimensional folds and prediction of protein function even at a very general level can facilitate the experimental determination of the molecular mechanism of action and the role that members of a given protein family fulfill in the cell. Here, we predict the three-dimensional fold and study the phylogenomic distribution of members of a large family of uncharacterized proteins classified in the Clusters of Orthologous Groups database as COG4636. RESULTS: Using protein fold-recognition we found that members of COG4636 are remotely related to Holliday junction resolvases and other nucleases from the PD-(D/E)XK superfamily. Structure modeling and sequence analyses suggest that most members of COG4636 exhibit a new, unusual variant of the putative active site, in which the catalytic Lys residue migrated in the sequence, but retained similar spatial position with respect to other functionally important residues. Sequence analyses revealed that members of COG4636 and their homologs are found mainly in Cyanobacteria, but also in other bacterial phyla. They undergo horizontal transfer and extensive proliferation in the colonized genomes; for instance in Gloeobacter violaceus PCC 7421 they comprise over 2% of all protein-encoding genes. Thus, members of COG4636 appear to be a new type of selfish genetic elements, which may fulfill an important role in the genome dynamics of Cyanobacteria and other species they invaded. Our analyses provide a platform for experimental determination of the molecular and cellular function of members of this large protein family. 
CONCLUSION: After submission of this manuscript, a crystal structure of one of the COG4636 members was released in the Protein Data Bank (code 1wdj; Idaka, M., Wada, T., Murayama, K., Terada, T., Kuramitsu, S., Shirouzu, M., Yokoyama, S.: Crystal structure of Tt1808 from Thermus thermophilus Hb8, to be published). Our analysis of the Tt1808 structure reveals that we correctly predicted all functionally important features of the COG4636 family, including membership in the PD-(D/E)XK superfamily of nucleases, the three-dimensional fold, the putative catalytic residues, and the unusual configuration of the active site.

    DARS-RNP and QUASI-RNP: New statistical potentials for protein-RNA docking

    BACKGROUND: Protein-RNA interactions play fundamental roles in many biological processes. Understanding the molecular mechanism of protein-RNA recognition and formation of protein-RNA complexes is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes is tedious and difficult, both by X-ray crystallography and NMR. For many interacting proteins and RNAs the individual structures are available, enabling prediction of complex structures by computational docking. However, methods for protein-RNA docking remain scarce, in particular in comparison to the numerous methods for protein-protein docking. RESULTS: We developed two medium-resolution, knowledge-based potentials for scoring protein-RNA models obtained by docking: the quasi-chemical potential (QUASI-RNP) and the Decoys As the Reference State potential (DARS-RNP). Both potentials use a coarse-grained representation for both RNA and protein molecules and are capable of dealing with RNA structures with posttranscriptionally modified residues. We compared the discriminative power of DARS-RNP and QUASI-RNP for selecting rigid-body docking poses with the potentials previously developed by the Varani and Fernandez groups. CONCLUSIONS: In both bound and unbound docking tests, DARS-RNP showed the highest ability to identify native-like structures. Python implementations of DARS-RNP and QUASI-RNP are freely available for download at http://iimcb.genesilico.pl/RNP/
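
    The general idea behind a knowledge-based contact potential can be illustrated with a minimal sketch. This is not the published DARS-RNP or QUASI-RNP code: it assumes the generic inverse-Boltzmann form E(a, b) = -ln(f_obs(a, b) / f_ref(a, b)), takes the reference frequencies either from docking decoys (in the spirit of a decoys-as-reference-state potential) or from random pairing of the observed types (in the spirit of a quasi-chemical potential), and uses hypothetical names (contact_potential, quasi_chemical_reference, score_pose) and a hypothetical coarse-grained labelling of contacts as (protein type, RNA type) pairs.

```python
import math
from collections import Counter


def contact_potential(observed, reference, pseudocount=1.0):
    """Return E(a, b) = -ln(f_obs / f_ref) for every contact pair.

    `observed` and `reference` map (protein_type, rna_type) -> count.
    A pseudocount (an assumption of this sketch) avoids log(0).
    """
    pairs = set(observed) | set(reference)
    n_obs = sum(observed.values()) + pseudocount * len(pairs)
    n_ref = sum(reference.values()) + pseudocount * len(pairs)
    potential = {}
    for pair in pairs:
        f_obs = (observed.get(pair, 0) + pseudocount) / n_obs
        f_ref = (reference.get(pair, 0) + pseudocount) / n_ref
        potential[pair] = -math.log(f_obs / f_ref)
    return potential


def quasi_chemical_reference(observed):
    """Expected counts if protein and RNA types paired at random
    (product of the marginal counts divided by the total)."""
    total = sum(observed.values())
    protein_totals, rna_totals = Counter(), Counter()
    for (protein_type, rna_type), count in observed.items():
        protein_totals[protein_type] += count
        rna_totals[rna_type] += count
    return {
        (p, r): protein_totals[p] * rna_totals[r] / total
        for p in protein_totals
        for r in rna_totals
    }


def score_pose(contacts, potential):
    """Sum the pair energies over all contacts of one docking pose.
    Pairs never seen during training contribute zero."""
    return sum(potential.get(pair, 0.0) for pair in contacts)


# Toy data: contacts counted in native complexes and in docking decoys.
native = Counter({("ARG", "P"): 8, ("LEU", "P"): 2,
                  ("ARG", "B"): 2, ("LEU", "B"): 8})
decoys = Counter({("ARG", "P"): 5, ("LEU", "P"): 5,
                  ("ARG", "B"): 5, ("LEU", "B"): 5})

decoy_based = contact_potential(native, decoys)
random_mixing = contact_potential(native, quasi_chemical_reference(native))
```

    In this toy example, pairs enriched in native complexes relative to the reference receive negative (favourable) energies, so a pose rich in such contacts scores lower than a pose rich in depleted pairs. The only difference between the two variants is where the reference counts come from.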

    RNA:(guanine-N2) methyltransferases RsmC/RsmD and their homologs revisited – bioinformatic analysis and prediction of the active site based on the uncharacterized Mj0882 protein structure

    BACKGROUND: Escherichia coli guanine-N2 (m(2)G) methyltransferases (MTases) RsmC and RsmD modify nucleosides G1207 and G966 of 16S rRNA. They possess a common MTase domain in the C-terminus and a variable region in the N-terminus. Their C-terminal domain is related to the YbiN family of hypothetical MTases, but nothing is known about the structure or function of the N-terminal domain. RESULTS: Using a combination of sequence database searches and fold recognition methods it has been demonstrated that the N-termini of RsmC and RsmD are related to each other and that they represent a "degenerated" version of the C-terminal MTase domain. Novel members of the YbiN family from Archaea and Eukaryota were also identified. It is inferred that YbiN and both domains of RsmC and RsmD are closely related to a family of putative MTases from Gram-positive bacteria and Archaea, typified by the Mj0882 protein from M. jannaschii (1dus in PDB). Based on the results of sequence analysis and structure prediction, the residues involved in cofactor binding, target recognition and catalysis were identified, and the mechanism of the guanine-N2 methyltransfer reaction was proposed. CONCLUSIONS: Using the known Mj0882 structure, a comprehensive analysis of sequence-structure-function relationships in the family of genuine and putative m(2)G MTases was performed. The results provide novel insight into the mechanism of m(2)G methylation and will serve as a platform for experimental analysis of numerous uncharacterized N-MTases.

    Conserved Amino Acids in Each Subunit of the Heteroligomeric tRNA m1A58 Mtase from Saccharomyces cerevisiae Contribute to tRNA Binding

    In Saccharomyces cerevisiae, a two-subunit methyltransferase (Mtase) encoded by the essential genes TRM6 and TRM61 is responsible for the formation of 1-methyladenosine, a modified nucleoside found at position 58 in tRNA that is critical for the stability of initiator methionine tRNA. The crystal structure of the homotetrameric m1A58 tRNA Mtase from Mycobacterium tuberculosis, TrmI, has been solved and was used as a template to build a model of the yeast m1A58 tRNA Mtase heterotetramer. We altered amino acids in TRM6 and TRM61 that were predicted to be important for the stability of the heteroligomer based on this model. Yeast strains expressing trm6 and trm61 mutants exhibited growth phenotypes indicative of reduced m1A formation. In addition, recombinant mutant enzymes had reduced in vitro Mtase activity. We demonstrate that the mutations introduced do not prevent heteroligomer formation and do not disrupt binding of the cofactor S-adenosyl-l-methionine. Instead, amino acid substitutions in either Trm6p or Trm61p destroy the ability of the yeast m1A58 tRNA Mtase to bind tRNA, indicating that each subunit contributes to tRNA binding and suggesting that a structural alteration of the substrate-binding pocket occurs when these mutations are present.

    The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function

    BACKGROUND: The PD-(D/E)XK nuclease superfamily, initially identified in type II restriction endonucleases and later in many enzymes involved in DNA recombination and repair, is one of the most challenging targets for protein sequence analysis and structure prediction. Typically, the sequence similarity between these proteins is so low that most of the relationships between known members of the PD-(D/E)XK superfamily were identified only after the corresponding structures were determined experimentally. Thus, it is tempting to speculate that among the uncharacterized protein families, there are potential nucleases that remain to be discovered, but their identification requires more sensitive tools than traditional PSI-BLAST searches. RESULTS: The low degree of amino acid conservation hampers the identification of new members of the PD-(D/E)XK superfamily based solely on sequence comparisons to known members. Therefore, we used HHsearch, a recently developed method for sensitive detection of remote similarities between protein families represented as profile Hidden Markov Models enhanced by secondary structure. To detect significant similarities, we compared known families of PD-(D/E)XK nucleases to a database comprising the COG and PFAM profiles of both functionally characterized and uncharacterized protein families. The initial candidates for new nucleases were subsequently verified by sequence-structure threading, comparative modeling, and identification of potential active site residues.
CONCLUSION: In this article, we report identification of the PD-(D/E)XK nuclease domain in numerous proteins implicated in interactions with DNA but with unknown structure and mechanism of action (such as putative recombinase RmuC, DNA competence factor CoiA, a DNA-binding protein SfsA, a large human protein predicted to be a DNA repair enzyme, predicted archaeal transcription regulators, and the head completion protein of phage T4) and in proteins for which no function was assigned to date (such as YhcG, various phage proteins, novel candidates for restriction enzymes). Our results contribute to the reduction of "white spaces" on the sequence-structure-function map of the protein universe and will help to jump-start the experimental characterization of new nucleases, of which many may be of importance for the complete understanding of mechanisms that govern the evolution and stability of the genome.
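
    To make the final verification step concrete, the following minimal sketch scans a protein sequence for the nominal pattern that gives the superfamily its name: an Asp, a spacer, an Asp or Glu, any residue, and a Lys. It is not the authors' procedure. The function name find_candidate_motifs and the spacer bounds (5 to 30 residues) are assumptions made for illustration, and a scan of this kind can only flag candidate residues for inspection in a structural model; as the abstract stresses, sequence conservation in this superfamily is too low for pattern matching alone to identify members.

```python
def find_candidate_motifs(seq, min_spacer=5, max_spacer=30):
    """Return 1-based positions (D, D/E, K) of every D-x(n)-[DE]-x-K match.

    `min_spacer` and `max_spacer` bound the number n of residues between
    the first Asp and the acidic residue; both defaults are illustrative
    assumptions, not values taken from the literature.
    """
    seq = seq.upper()
    hits = []
    for i, residue in enumerate(seq):
        if residue != "D":
            continue
        for spacer in range(min_spacer, max_spacer + 1):
            acidic = i + 1 + spacer   # index of the D/E residue
            lysine = acidic + 2       # index of the K residue
            if lysine >= len(seq):
                break                 # longer spacers would run off the end
            if seq[acidic] in "DE" and seq[lysine] == "K":
                hits.append((i + 1, acidic + 1, lysine + 1))
    return hits


# Toy sequence: Asp at position 4, Glu at 15, Lys at 17.
example = "MAPD" + "A" * 10 + "EVK" + "GG"
candidates = find_candidate_motifs(example)
```

    Each returned triple lists the sequence positions of the three candidate residues, which can then be checked for spatial proximity in a comparative model.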

    RIBER/DIBER: a software suite for crystal content analysis in the studies of protein–nucleic acid complexes

    Summary: Co-crystallization experiments of proteins with nucleic acids do not guarantee that both components are present in the crystal. We have previously developed DIBER to predict crystal content when protein and DNA are present in the crystallization mix. Here, we present RIBER, which should be used when protein and RNA are in the crystallization drop. The combined RIBER/DIBER suite builds on machine learning techniques to make reliable, quantitative predictions of crystal content for non-expert users and high-throughput crystallography

    Characterization of the cofactor-binding site in the SPOUT-fold methyltransferases by computational docking of S-adenosylmethionine to three crystal structures

    BACKGROUND: There are several evolutionarily unrelated and structurally dissimilar superfamilies of S-adenosylmethionine (AdoMet)-dependent methyltransferases (MTases). A new superfamily (SPOUT) has been recently characterized at the sequence level and three structures of its members (1gz0, 1ipa, and 1k3r) have been solved. However, none of these structures include the cofactor or the substrate. Due to the strong evolutionary divergence and the paucity of experimental information, no confident predictions of protein-ligand and protein-substrate interactions could be made, which hampered the study of sequence-structure-function relationships in the SPOUT superfamily. RESULTS: We used the computational docking program AutoDock to identify the AdoMet-binding site on the surface of three MTase structures. We analyzed the sequence divergence in two distinct lineages of the SPOUT superfamily in the context of surface features and preferred cofactor binding mode to propose specific functions for the conserved residues. CONCLUSION: Our docking analysis has confidently predicted the common AdoMet-binding site in three remotely related protein structures. In the vicinity of the cofactor-binding site, subfamily-conserved grooves were identified on the protein surface, suggesting the location of the target-binding/catalytic site. Functionally important residues were inferred and a general reaction mechanism, involving conformational change of a glycine-rich loop, was proposed.

    Phylogenomic analysis of the GIY-YIG nuclease superfamily

    BACKGROUND: The GIY-YIG domain was initially identified in homing endonucleases and later in other selfish mobile genetic elements (including restriction enzymes and non-LTR retrotransposons) and in enzymes involved in DNA repair and recombination. However, to date no systematic search for novel members of the GIY-YIG superfamily or comparative analysis of these enzymes has been reported. RESULTS: We carried out database searches to identify all members of known GIY-YIG nuclease families. Multiple sequence alignments together with predicted secondary structures of identified families were represented as Hidden Markov Models (HMM) and compared by the HHsearch method to the uncharacterized protein families gathered in the COG, KOG, and PFAM databases. This analysis allowed for extending the GIY-YIG superfamily to include members of COG3680 and a number of proteins not classified in COGs and to predict that these proteins may function as nucleases, potentially involved in DNA recombination and/or repair. Finally, all old and new members of the GIY-YIG superfamily were compared and analyzed to infer the phylogenetic tree. CONCLUSION: An evolutionary classification of the GIY-YIG superfamily is presented for the very first time, along with the structural annotation of all (sub)families. It provides a comprehensive picture of sequence-structure-function relationships in this superfamily of nucleases, which will help to design experiments to study the mechanism of action of known members (especially the uncharacterized ones) and will facilitate the prediction of function for the newly discovered ones